
27 research outputs found

Data and Text Mining Techniques for In-Domain and Cross-Domain Applications

Author: Domeniconi Giacomo <1986>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 12/05/2016

In the big data era, a vast amount of data has been generated in different domains, from social media to news feeds, from health care to genomic functionalities. When addressing a problem, we usually need to harness multiple disparate datasets. Data from different domains may follow different modalities, each of which has a different representation, distribution, scale and density. For example, text is usually represented as discrete sparse word count vectors, whereas an image is represented by pixel intensities, and so on. Many Data Mining and Machine Learning techniques have been proposed in the literature and have already achieved significant success in many knowledge engineering areas, including classification, regression and clustering. Nevertheless, some challenging issues remain when tackling a new problem: how should the problem be represented? Which approach is best among the huge number of possibilities? What information should be used in the Machine Learning task, and how should it be represented? Are there different domains from which knowledge can be borrowed? This dissertation proposes some possible representation approaches for problems in different domains, from text mining to genomic analysis. In particular, one of the major contributions is a different way to represent a classical classification problem: instead of using an instance related to each object (a document, a gene, a social post, etc.) to be classified, it proposes using a pair of objects or an object-class pair, with the relationship between them as the label. This approach is tested on both flat and hierarchical text categorization datasets, where it potentially allows the efficient addition of new categories during classification. Furthermore, the same idea is used to extract conversational threads from an unregulated pool of messages and also to classify the biomedical literature based on the genomic features treated.
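The object-class pair representation described in this abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the single cosine-similarity feature, the keyword "profiles" standing in for categories, and the names `bag_of_words`, `cosine` and `make_pairs` are all assumptions made for illustration, and the toy data is invented.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lowercased word-count vector for a text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_pairs(documents, category_profiles):
    """Turn (text, true_category) documents into (document, category) pair instances.

    Each pair carries one hypothetical feature (cosine similarity) and a binary
    label stating whether the document is related to that category.
    """
    pairs = []
    for text, true_category in documents:
        doc_vec = bag_of_words(text)
        for category, profile in category_profiles.items():
            feature = cosine(doc_vec, bag_of_words(profile))
            pairs.append((text, category, feature, category == true_category))
    return pairs


# Toy data, invented for illustration only.
profiles = {
    "genomics": "gene protein genome sequence expression",
    "finance": "stock market price index trading",
}
docs = [
    ("gene expression in the human genome", "genomics"),
    ("stock price movements of the market index", "finance"),
]
pairs = make_pairs(docs, profiles)
for text, category, feature, related in pairs:
    print(f"{category:9s} related={related!s:5s} similarity={feature:.3f}  {text}")
```

Because the label describes the relationship within a pair rather than the category itself, a model trained on such pairs does not depend on a fixed category set. In this sketch, adding a category amounts to adding one entry to `profiles`.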

AMS Tesi di Dottorato

On Deep Learning in Cross-Domain Sentiment Classification

Author: Andrea Pagliarani
Giacomo Domeniconi
Gianluca Moro
Roberto Pasolini
Publication venue: country:PRT
Publication date: 01/01/2017

Cross-domain sentiment classification consists of distinguishing positive and negative reviews of a target domain by using knowledge extracted and transferred from a heterogeneous source domain. Cross-domain solutions aim at overcoming the costly pre-classification of each new training set by human experts. Despite the potential business relevance of this research thread, the existing ad hoc solutions still do not scale to real, large text sets. Scalable Deep Learning techniques have been effectively applied to in-domain text classification, by training on and categorising documents belonging to the same domain. This work analyses the cross-domain efficacy of a well-known unsupervised Deep Learning approach for text mining, called Paragraph Vector, comparing its performance with a Markov Chain based method developed ad hoc for cross-domain sentiment classification. The experiments show that, once enough data is available for training, Paragraph Vector achieves accuracy equivalent to Markov Chain both in-domain and cross-domain, despite having no explicit transfer learning capability. The outcome suggests that combining Deep Learning with transfer learning techniques could be a breakthrough for ad hoc cross-domain sentiment solutions in big data scenarios. This view is supported by a very simple multi-source experiment that we ran to improve transfer learning, which increases the accuracy of cross-domain sentiment classification.

ZENODO

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Learning to Predict the Stock Market Dow Jones Index Detecting and Mining Relevant Tweets

Author: Andrea Pagliarani
Giacomo Domeniconi
Gianluca Moro
Roberto Pasolini
Publication venue: country:PRT
Publication date: 01/01/2017

Stock market analysis is a primary interest in finance and a challenging task that has always attracted many researchers. Historically, this task was accomplished by means of trend analysis, but in recent years text mining has been emerging as a promising way to predict stock price movements. Indeed, previous works showed not only a strong correlation between financial news and its impact on the movements of stock prices, but also that the analysis of social network posts can help to predict them. These latter methods are mainly based on complex techniques to extract the semantic content and/or the sentiment of the social network posts. In contrast, in this paper we describe a method to predict the Dow Jones Industrial Average (DJIA) price movements based on simpler mining techniques and text similarity measures, in order to detect and characterise relevant tweets that lead to increments and decrements of the DJIA. Considering the high level of noise in social network data, we also introduce a noise detection method based on a two-step classification. We tested our method on 10 million Twitter posts spanning one year, achieving an accuracy of 88.9% in the Dow Jones daily prediction, which is, to the best of our knowledge, the best result in the literature among approaches based on social networks.

ZENODO

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs

Author: Chen Jie
Domeniconi Giacomo
Kaler Tim
Kanezashi Hiroki
Leiserson Charles E.
Ma Tengfei
Pareja Aldo
Schardl Tao B.
Suzumura Toyotaro
Publication date: 18/11/2019

Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspires various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. Existing approaches typically resort to node embeddings and use a recurrent neural network (RNN, broadly speaking) to regulate the embeddings and learn the temporal dynamics. These methods require the knowledge of a node in the full time span (including both training and testing) and are less applicable to the frequent change of the node set. In some extreme scenarios, the node sets at different time steps may completely differ. To resolve this challenge, we propose EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings. The proposed approach captures the dynamism of the graph sequence by using an RNN to evolve the GCN parameters. Two architectures are considered for the parameter evolution. We evaluate the proposed approach on tasks including link prediction, edge classification, and node classification. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. The code is available at https://github.com/IBM/EvolveGCN. Comment: AAAI 2020.
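The central idea of this abstract, a recurrent update applied to the GCN weights rather than to node embeddings, can be sketched in plain Python as follows. This is a simplified illustration and not the code from the repository above: the gated update in `evolve_weights` is a stand-in for the recurrent cells used in the paper, the recurrent parameters `gate_w` and `cand_w` are random rather than learned, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN normalisation with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def evolve_weights(weights, gate_w, cand_w):
    """Simplified gated recurrent update of the GCN weight matrix (an assumption).

    W_t = (1 - z) * W_{t-1} + z * tanh(W_{t-1} @ U_c), with z = sigmoid(W_{t-1} @ U_z).
    gate_w (U_z) and cand_w (U_c) are hypothetical recurrent parameters.
    """
    gate = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in matmul(weights, gate_w)]
    cand = [[math.tanh(v) for v in row] for row in matmul(weights, cand_w)]
    return [[(1 - z) * w + z * c for w, z, c in zip(w_row, z_row, c_row)]
            for w_row, z_row, c_row in zip(weights, gate, cand)]


def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
in_dim, out_dim = 3, 2
weights = random_matrix(in_dim, out_dim, rng)
gate_w = random_matrix(out_dim, out_dim, rng)
cand_w = random_matrix(out_dim, out_dim, rng)

# Two snapshots whose node sets differ in size (2 nodes, then 3 nodes).
snapshots = [
    ([[0, 1], [1, 0]], random_matrix(2, in_dim, rng)),
    ([[0, 1, 0], [1, 0, 1], [0, 1, 0]], random_matrix(3, in_dim, rng)),
]
embeddings = []
for adj, feats in snapshots:
    weights = evolve_weights(weights, gate_w, cand_w)  # the recurrence acts on W only
    embeddings.append(gcn_layer(adj, feats, weights))
```

Since the recurrence touches only the weight matrix, whose shape depends on the feature dimensions and not on the number of nodes, the same loop handles snapshots with different node sets, which is the scenario the abstract highlights.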

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

GOTA: GO term annotation of biomedical literature

Author: A Doms
A Schlicker
A Singhal
C Blaschke
D Li
DL Rubin
G Salton
Giacomo Domeniconi
Gianluca Moro
J Gobeill
J Lomax
J Rousu
K Verspoor
L Du Plessis
L Hirschman
Luciano Margara
M Ashburner
MF Porter
N Cesa-Bianchi
N Skunca
NR Silla
NS Altman
P Radivojac
Pietro Di Lena
SE Lewis
T Liu
TH Wonnacott
Y Mao
Y Tao
Z Barutcuoglu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

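Hierarchical Text Mining: Semantic Classification of Documents into Topic Taxonomies (original title: Text mining gerarchico: classificazione semantica di documenti in tassonomie di argomenti)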

Author: Domeniconi Giacomo
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 26/07/2012

AMS Tesi di Laurea